Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems
Abstract
In a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. Determining the way in which decisions are taken (online, batch or simulated) and the resources consumed in decision making (e.g. execution time, memory) will influence, to a major degree, the game performance. When classical search algorithms such as A∗ can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. In those cases, model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy that incorporates the ability of heuristic search in path-finding into a Dyna agent. The proposed Dyna-H algorithm, as A∗ does, selects the branches most likely to produce outcomes over other branches. Moreover, it has the advantage of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms with excellent experimental results: Dyna-H significantly outperforms both methods in all experiments. We also suggest a functional analogy between the proposed sampling-from-worst-trajectories heuristic and the role of dreams (e.g. nightmares) in human behavior.
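To make the idea concrete, below is a minimal sketch of how a heuristic-guided Dyna planning loop of this kind could look. It assumes a tabular grid-world setting, a distance-to-goal heuristic (e.g. Euclidean distance), and a hypothetical environment interface (`reset()`, `step()`, `actions`); it is an illustration of the general technique described in the abstract, not the authors' reference implementation.

```python
import random
from collections import defaultdict

def dyna_h(env, heuristic, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna agent with heuristic-guided planning (illustrative sketch).

    Assumes `env` exposes reset(), step(a) -> (next_state, reward, done) and a
    discrete `actions` list, and that `heuristic(s)` estimates distance-to-goal.
    These interfaces are assumptions made for this sketch.
    """
    Q = defaultdict(float)          # Q[(state, action)]
    model = defaultdict(dict)       # model[state][action] = (reward, next_state)

    def greedy(s):
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy acting in the real environment (model-free step)
            a = random.choice(env.actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)

            # direct one-step Q-learning update from real experience
            target = r + gamma * max(Q[(s2, b)] for b in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # model learning: remember the observed transition (deterministic model)
            model[s][a] = (r, s2)

            # heuristic planning: from a randomly drawn visited state, replay the
            # stored action whose successor looks worst under the heuristic
            # ("sampling from worst trajectories")
            for _ in range(planning_steps):
                ps = random.choice(list(model))
                pa = max(model[ps], key=lambda act: heuristic(model[ps][act][1]))
                pr, ps2 = model[ps][pa]
                ptarget = pr + gamma * max(Q[(ps2, b)] for b in env.actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s2
    return Q
```

The only difference from a standard Dyna agent is the planning step: instead of replaying remembered transitions uniformly at random, it biases replay toward successors the heuristic ranks as worst, which is one way to read the abstract's "sampling from worst trajectories" idea.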
Similar resources
Wp-dyna: Planning and Reinforcement Learning in Well-plannable Environments
Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal policy in RL may be very slow. To speed up learning, one often-used solution is the integration of planning, for example, Sutton's Dyna algorithm, or various oth...
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna archi...
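For contrast with the heuristic-guided replay sketched earlier, a classical Dyna-Q planning step samples previously experienced state-action pairs uniformly at random. A minimal sketch of that inner loop, reusing the hypothetical `Q`, `model` and `env` structures assumed above, might be:

```python
import random

def dyna_q_planning(Q, model, env, planning_steps, alpha, gamma):
    """Classical Dyna-Q planning loop: uniform replay of remembered transitions."""
    for _ in range(planning_steps):
        ps = random.choice(list(model))           # any previously visited state
        pa = random.choice(list(model[ps]))       # any action already tried from it
        pr, ps2 = model[ps][pa]
        target = pr + gamma * max(Q[(ps2, b)] for b in env.actions)
        Q[(ps, pa)] += alpha * (target - Q[(ps, pa)])
```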
Reinforcement Learning–Based Energy Management Strategy for a Hybrid Electric Tracked Vehicle
This paper presents a reinforcement learning (RL)–based energy management strategy for a hybrid electric tracked vehicle. A control-oriented model of the powertrain and vehicle dynamics is first established. According to the sample information of the experimental driving schedule, statistical characteristics at various velocities are determined by extracting the transition probability matrix of...
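The transition probability matrix mentioned here can be estimated directly from a recorded driving schedule by counting observed transitions between discretized states and normalizing each row. A small sketch of that estimation step, with the state binning chosen purely for illustration, is shown below.

```python
import numpy as np

def transition_matrix(samples, n_states):
    """Estimate a Markov transition probability matrix from a sampled sequence.

    `samples` is a 1-D sequence of discretized states (e.g. velocity or power-demand
    bins extracted from a recorded driving schedule); the binning is an assumption
    made for this sketch.
    """
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(samples[:-1], samples[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # leave rows of unvisited states as all zeros instead of dividing by zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# example: a short trace of velocities discretized into 5 bins
demo = transition_matrix([0, 1, 1, 2, 3, 2, 1, 0, 4, 4, 3], n_states=5)
```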
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna architect...
Competitive Reinforcement Learning for Combinatorial Problems
This paper shows that the competitive learning rule found in Learning Vector Quantization (LVQ) serves as a promising function approximator to enable reinforcement learning methods to cope with a large decision search space, defined in terms of different classes of input patterns, like those found in the game of Go. In particular, this paper describes S[arsa]LVQ, a novel reinforcement learning ...
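One way such a combination of an LVQ-style competitive learner with a Sarsa-style update could look is sketched below: each prototype stores a codebook vector in state-feature space plus one Q-value per action, the nearest prototype answers Q(s, a), and only the winner is updated. The class name, prototype count, and learning rates are assumptions for illustration, not the paper's actual S[arsa]LVQ design.

```python
import numpy as np

class SarsaLVQSketch:
    """Illustrative sketch: LVQ-style prototypes as a Q-function approximator."""

    def __init__(self, n_prototypes, n_features, n_actions,
                 alpha=0.1, beta=0.01, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(n_prototypes, n_features))  # prototype vectors
        self.q = np.zeros((n_prototypes, n_actions))                 # Q-values per prototype
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def _winner(self, x):
        # competitive step: the nearest prototype wins
        return int(np.argmin(np.linalg.norm(self.codebook - x, axis=1)))

    def value(self, x, a):
        return self.q[self._winner(x), a]

    def update(self, x, a, r, x_next, a_next, done):
        """Sarsa backup applied to the winning prototype's Q entry."""
        w = self._winner(x)
        target = r if done else r + self.gamma * self.value(x_next, a_next)
        self.q[w, a] += self.alpha * (target - self.q[w, a])
        # competitive learning: pull the winning codebook vector toward the input
        self.codebook[w] += self.beta * (x - self.codebook[w])
```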
Journal: Knowl.-Based Syst.
Volume: 32
Pages: -
Publication year: 2012